
Neural-Augmented Static Analysis of Android Communication

Authors: Abadi Martín; Allamanis Miltiadis; Elish Karim O; Kim Yoon; Kremenek Ted; Octeau Damien; van der Maaten Laurens; Yang Wei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/09/2018

We address the problem of discovering communication links between applications in the popular Android mobile operating system, an important problem for security and privacy in Android. Any scalable static analysis in this complex setting is bound to produce an excessive number of false positives, rendering it impractical. To improve precision, we propose to augment static analysis with a trained neural-network model that estimates the probability that a communication link truly exists. We describe a neural-network architecture that encodes abstractions of communicating objects in two applications and estimates the probability with which a link indeed exists. At the heart of our architecture are type-directed encoders (TDE), a general framework for elegantly constructing encoders of a compound data type by recursively composing encoders for its constituent types. We evaluate our approach on a large corpus of Android applications, and demonstrate that it achieves very high accuracy. Further, we conduct thorough interpretability studies to understand the internals of the learned neural networks. Comment: Appears in Proceedings of the 2018 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

Sources: arXiv.org e-Print Archive; Crossref
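
The type-directed-encoder idea from the abstract above can be illustrated with a small, untrained sketch. It assumes a toy embedding width (`DIM`), a hand-made character-bucket encoder for strings, mean pooling for lists, and a fixed random projection standing in for learned weights on records. The `Intent` and `IntentFilter` types and every function name are hypothetical simplifications, not the paper's architecture or its actual Android abstractions.

```python
import math
import random
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable, List, Tuple, get_args, get_origin

DIM = 4  # embedding width; an arbitrary choice for this sketch
Vec = List[float]
Encoder = Callable[[Any], Vec]


def encode_str(s: str) -> Vec:
    # Primitive encoder (hypothetical): length-normalized character-bucket counts.
    v = [0.0] * DIM
    for ch in s:
        v[ord(ch) % DIM] += 1.0
    return [x / max(len(s), 1) for x in v]


def encode_list(elem: Encoder) -> Encoder:
    # Collection encoder: mean of the element encodings, so order does not matter.
    def enc(xs: List[Any]) -> Vec:
        if not xs:
            return [0.0] * DIM
        vecs = [elem(x) for x in xs]
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    return enc


def encode_record(field_encs: List[Tuple[str, Encoder]], seed: int) -> Encoder:
    # Record encoder: concatenate the field encodings, then project back to DIM.
    # The fixed random matrix W stands in for weights that would be trained.
    rng = random.Random(seed)
    width = DIM * len(field_encs)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(DIM)]

    def enc(rec: Any) -> Vec:
        concat = [x for name, e in field_encs for x in e(getattr(rec, name))]
        return [math.tanh(sum(w * x for w, x in zip(row, concat))) for row in W]
    return enc


def encoder_for(tp: Any) -> Encoder:
    # Type-directed construction: recurse on the structure of the type.
    if tp is str:
        return encode_str
    if get_origin(tp) is list:
        (elem_tp,) = get_args(tp)
        return encode_list(encoder_for(elem_tp))
    if is_dataclass(tp):
        field_encs = [(f.name, encoder_for(f.type)) for f in fields(tp)]
        return encode_record(field_encs, seed=sum(map(ord, tp.__name__)))
    raise TypeError(f"no encoder for {tp!r}")


@dataclass
class Intent:  # hypothetical abstraction of a sending object
    action: str
    categories: List[str]


@dataclass
class IntentFilter:  # hypothetical abstraction of a receiving object
    actions: List[str]
    categories: List[str]


def link_probability(intent: Intent, flt: IntentFilter) -> float:
    # Score a candidate link as the sigmoid of the dot product of the two encodings.
    a = encoder_for(Intent)(intent)
    b = encoder_for(IntentFilter)(flt)
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(a, b))))
```

The point of the construction is that no encoder is written by hand for `Intent` or `IntentFilter`: `encoder_for` derives them from the encoders for `str` and `List[str]`, which is what lets the same framework cover many compound types.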

The Impact of Generic Data Structures: Decoding the Role of Lists in the Linux Kernel

Authors: Dyer Robert; Engler Dawson R.; Foster Jeffrey S; Fowler M.; Huang Wei; Julia; Kremenek Ted; Lu Shan; O'Callahan Robert; Padioleau Yoann; Papi Matthew M; Parnin Chris; Saha Suman; Shankar Umesh; Stroustrup Bjarne; Vakilian Mohsen; Volanschi Nic; von Rhein Alexander; Weitz Konstantin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/09/2020

The increasing adoption of the Linux kernel has been sustained by a large and constant maintenance effort, performed by a wide and heterogeneous base of contributors. One important problem that maintainers face in any code base is the rapid understanding of complex data structures. The Linux kernel is written in the C language, which enables the definition of arbitrarily uninformative datatypes via the use of casts and pointer arithmetic, of which doubly linked lists are a prominent example. In this paper, we explore the advantages and disadvantages of such lists for expressivity, code understanding, and code reliability. Based on our observations, we have developed a toolset that includes inference of descriptive list types and a tool for list visualization. Our tools identify more than 10,000 list fields and variables in recent Linux kernel releases and succeed in typing 90% of them. We show how these tools could have been used to detect previously fixed bugs, and they identify 6 new ones.

Sources: Crossref; INRIA a CCSD electronic archive server; Hal-Diderot

Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations

Authors: Dawson Engler; Ted Kremenek
Publication venue: Springer
Publication date: 01/01/2003

This paper explores z-ranking, a technique to rank error reports emitted by static program checking analysis tools. Such tools often use approximate analysis schemes, leading to false error reports. These reports can easily render the error checker useless by hiding real errors amidst the false ones, and can even cause the tool to be discarded as irrelevant. Empirically, all tools that effectively find errors have false positive rates that can easily reach 30–100%. Z-ranking employs a simple statistical model to rank the error messages most likely to be true errors above those least likely to be. This paper demonstrates that z-ranking applies to a range of program checking problems and that it performs up to an order of magnitude better than randomized ranking. Further, it has transformed previously unusable analysis tools into effective program error finders.

Sources: CiteSeerX; Crossref
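
A minimal sketch of the z-ranking idea follows. It assumes that each grouping of checks (for example, all checks involving one lock or one function) records how often the checked rule held (`successes`) and how often the checker emitted a report (`failures`), and that the baseline `p0` is the pooled success rate over all groups. The one-sample proportion z-statistic then ranks reports from groups whose rule usually holds above reports from groups where it usually fails. The names `Group`, `z_score` and `z_rank` are illustrative; the paper's exact grouping and baseline choices are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Group:
    successes: int  # checks in this group where the rule held
    failures: int   # checks in this group that produced an error report


def z_score(g: Group, p0: float) -> float:
    # One-sample z-statistic of the group's observed success rate against p0.
    n = g.successes + g.failures
    p_hat = g.successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def z_rank(groups: Dict[str, Group]) -> List[Tuple[str, float]]:
    total_s = sum(g.successes for g in groups.values())
    total_n = sum(g.successes + g.failures for g in groups.values())
    if total_n == 0:
        return []
    p0 = total_s / total_n  # pooled baseline (an assumption of this sketch)
    if not 0.0 < p0 < 1.0:
        raise ValueError("need both successes and failures overall to rank")
    # Only groups with at least one report have anything to rank.
    scored = [(name, z_score(g, p0)) for name, g in groups.items() if g.failures > 0]
    # High z: the rule is usually obeyed, so its rare violations look like real bugs.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with `{"lock_a": Group(9, 1), "lock_b": Group(1, 9)}` the pooled baseline is 0.5, `lock_a` scores about +2.53 and `lock_b` about -2.53, so the single report from `lock_a` is inspected first.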

A memory model for static analysis of C programs

Authors: Kremenek Ted; Xu Zhongxing; Zhang Jian
Publication venue: European Association of Software Science and Technology (EASST)
Publication date: 01/01/2010

Automatic bug finding with static analysis requires precise tracking of the values of different memory objects. This paper describes a memory modeling method for static analysis of C programs. It is particularly suitable for precise path-sensitive analyses, e.g., symbolic execution. It can handle almost all kinds of C expressions, including arbitrary levels of pointer dereferences, pointer arithmetic, composite array and struct data types, arbitrary type casts, and dynamic memory allocation. It maps aliased lvalue expressions to the same object without extra alias analysis. The model has been implemented in the Clang static analyzer and substantially enhances the analyzer by enabling precise value tracking. © 2010 Springer-Verlag

Source: Institute of Software, Chinese Academy of Sciences
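
The abstract's claim that aliased lvalues map to the same object without a separate alias analysis can be illustrated with a deliberately tiny region model. In this sketch (an assumption, far simpler than the Clang analyzer's actual memory model), every variable owns one region, a pointer value is the address of a region, and resolving an lvalue on the current path follows the stored pointer values. `Var`, `AddrOf`, `Deref`, `Region`, `Loc`, `lvalue` and `rvalue` are hypothetical names, and heap objects, fields, arrays, casts and pointer arithmetic are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class Var:  # a named C variable, e.g. `x`
    name: str


@dataclass(frozen=True)
class AddrOf:  # `&e`
    expr: Any  # any Expr


@dataclass(frozen=True)
class Deref:  # `*e`
    expr: Any  # any Expr


Expr = Union[Var, AddrOf, Deref]


@dataclass(frozen=True)
class Region:  # one memory object per declared variable
    name: str


@dataclass(frozen=True)
class Loc:  # a pointer value: the address of a region
    region: Region


Value = Union[int, Loc]
Store = Dict[Region, Value]  # path-specific bindings from regions to values


def lvalue(e: Expr, store: Store) -> Region:
    # Resolve an lvalue expression to the memory object it designates.
    if isinstance(e, Var):
        return Region(e.name)
    if isinstance(e, Deref):
        ptr = rvalue(e.expr, store)
        if not isinstance(ptr, Loc):
            raise ValueError(f"dereference of a non-pointer value in {e!r}")
        return ptr.region
    raise ValueError(f"not an lvalue: {e!r}")


def rvalue(e: Expr, store: Store) -> Value:
    # `&e` yields the address of e's region; anything else reads the store.
    if isinstance(e, AddrOf):
        return Loc(lvalue(e.expr, store))
    return store[lvalue(e, store)]
```

With `int x = 42; int *p = &x; int **pp = &p;` encoded as `{Region("x"): 42, Region("p"): Loc(Region("x")), Region("pp"): Loc(Region("p"))}`, the lvalues `x`, `*p`, `**pp` and `*&x` all resolve to `Region("x")`, so a write through any of them would update the same store entry.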

MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties

Authors: Dawson Engler; Junfeng Yang; Ted Kremenek; Yichen Xie
Publication venue: ACM Press
Publication date: 01/01/2003

This paper describes a system and annotation language, MECA, for checking security rules. MECA is expressive and designed for checking real systems. It provides a variety of practical constructs to effectively annotate large bodies of code. For example, it allows programmers to write programmatic annotators that automatically annotate large bodies of source code. As another example, it lets programmers use general predicates to determine whether an annotation is applied; we have used this ability to easily handle kernel backdoors and other false-positive-inducing constructs. Once code is annotated, MECA propagates annotations aggressively, allowing a single manual annotation to derive many additional annotations (e.g., over one hundred in our experiments), freeing programmers from the heavy manual effort required by most past systems. MECA is effective. Our most thorough case study was a user-pointer checker that used 75 annotations to check thousands of declarations in millions of lines of code in the Linux system. It found over forty errors, many of which were serious, while producing only eight false positives.

Source: CiteSeerX
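
How one manual annotation can derive many others can be pictured as a reachability closure. The sketch below assumes annotations are simple labels on program variables and that they flow only forward along value flows such as assignments and argument-to-parameter bindings; MECA's actual propagation rules and annotation language are richer, and `propagate` and the variable names in the example are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def propagate(seeds: Iterable[str], flows: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every variable reachable from a manually annotated seed.

    `flows` holds (src, dst) pairs meaning a value flows from src into dst,
    e.g. an assignment `dst = src` or passing argument src to parameter dst.
    """
    succ: Dict[str, List[str]] = defaultdict(list)
    for src, dst in flows:
        succ[src].append(dst)
    annotated = set(seeds)
    work = deque(annotated)
    while work:  # worklist traversal until no new variable gets annotated
        v = work.popleft()
        for w in succ[v]:
            if w not in annotated:
                annotated.add(w)
                work.append(w)
    return annotated
```

For instance, annotating only `entry.ubuf` as a user pointer, with flows `entry.ubuf -> helper.buf -> copy_out.dst`, also marks `helper.buf` and `copy_out.dst`, while unrelated variables stay unannotated.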

From uncertainty to belief: Inferring the specification within

Authors: Andrew Ng; Dawson Engler; Godmar Back; Paul Twohey; Ted Kremenek
Publication venue: USENIX Association
Publication date: 01/01/2006

Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presents a novel framework based on factor graphs for automatically inferring specifications directly from programs. The key strength of the approach is that it can incorporate many disparate sources of evidence, allowing us to squeeze significantly more information from our observations than previously published techniques. We illustrate the strengths of our approach by applying it to the problem of inferring which functions in C programs allocate and release resources. We evaluated its effectiveness on five codebases: SDL, OpenSSH, GIMP, and the OS kernels for Linux and Mac OS X (XNU). For each codebase, starting with zero initially provided annotations, we observed an inferred annotation accuracy of 80–90%, often with near-perfect accuracy for functions called as few as five times. Many of the inferred allocator and deallocator functions are functions whose implementation we lack and that are rarely called, in some cases with at most one or two call sites. Finally, with the inferred annotations we quickly found both missing and incorrect properties in a specification used by a commercial static bug-finding tool.

Source: CiteSeerX
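
To make "factor graph" concrete, the sketch below computes exact marginals on a tiny boolean factor graph by enumerating every assignment. It is a generic illustration, not the paper's model: the variables (such as "`my_alloc` is an allocator"), the factor potentials, and the function name `marginals` are hypothetical, and enumeration is only feasible for a handful of variables, so larger graphs need approximate inference.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# A factor is (scope, potential): the potential maps the boolean values of the
# variables in its scope to a non-negative weight.
Factor = Tuple[Sequence[str], Callable[..., float]]


def marginals(variables: List[str], factors: List[Factor]) -> Dict[str, float]:
    """Exact P(v = True) for each variable by enumerating all 2^n assignments."""
    totals = {v: 0.0 for v in variables}
    z = 0.0  # partition function: total weight of all assignments
    for values in itertools.product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, potential in factors:
            weight *= potential(*(assignment[v] for v in scope))
        z += weight
        for v in variables:
            if assignment[v]:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}
```

As a small example of combining evidence, take two variables, "`my_alloc` allocates" and "`my_release` deallocates", with no other factors except one coupling factor (say, because `my_alloc`'s return value is later passed to `my_release`) that gives weight 4 to the assignment where both are true and 1 otherwise. Each marginal then rises from 0.5 to 5/7, roughly 0.71.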

Learning a strategy for adapting a program analysis via Bayesian optimisation

Authors: Brochu Eric; Kremenek Ted; Rasmussen Carl Edward; Snoek Jasper
Publication venue: Association for Computing Machinery (ACM)

Source: Crossref

Correlation exploitation in error ranking

Authors: Cowell R. G.; Dawson Engler; Good P.; Junfeng Yang; Ken Ashcraft; Shankar U.; Ted Kremenek; Wagner D.; Yedidia J. S.
Publication venue: Association for Computing Machinery (ACM)

Source: Crossref

Abstraction refinement guided by a learnt probabilistic model

Authors: Allamanis Miltiadis; Chris; Clarke E.; Erdős Paul; Getoor Lise; Gupta Anubhav; Hongseok Yang; Kremenek Ted; Lee Joohyung; Marques-Silva João; Radu Grigore; Sato Taisuke
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/11/2015

The core challenge in designing an effective static program analysis is to find a good program abstraction: one that retains only the details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. Our approach applies to parametric static analyses implemented in Datalog, and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdős–Rényi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object-sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).

Sources: arXiv.org e-Print Archive; Crossref; Oxford University Research Archive; Kent Academic Repository
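
The abstract above combines a success probability and a cost for each candidate abstraction but does not spell out the combination rule. One standard rule, used here purely as an assumption, comes from sequential search: if candidates are tried one at a time until one succeeds, and their successes are independent, ordering them by increasing cost/probability minimizes the expected total cost, so the next candidate to try is the one with the smallest ratio. The `Candidate` type, the size-based cost, and the probabilities are hypothetical; neither the paper's refinement algorithm nor its Erdős–Rényi-based model is reproduced.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Candidate:
    params: FrozenSet[str]  # hypothetical: abstraction parameters switched on
    p_success: float        # model's estimate that this abstraction proves the query

    @property
    def cost(self) -> float:
        # Abstraction size as a proxy for analysis time (+1 so cost is never 0).
        return 1.0 + len(self.params)


def pick_next(candidates: Sequence[Candidate]) -> Candidate:
    # Smallest cost per unit of success probability first.
    viable = [c for c in candidates if c.p_success > 0.0]
    return min(viable, key=lambda c: c.cost / c.p_success)


def expected_cost(order: Sequence[Candidate]) -> float:
    # Expected total cost of trying `order` until the first success,
    # treating the successes as independent events.
    total, p_all_failed = 0.0, 1.0
    for c in order:
        total += p_all_failed * c.cost
        p_all_failed *= 1.0 - c.p_success
    return total
```

Nested abstractions are typically not independent (if a finer abstraction fails to prove the query, a coarser one usually fails too), so in this setting the ratio rule is a heuristic motivated by the independent case rather than a guarantee of optimality.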